15 research outputs found

    Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

    In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is carried out first with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011

    Extraction, Analysis and Synthesis of Fujisaki model Parameters

    A Study on the Perception of Tone and Intonation in Sesotho


    Directions for the future of technology in pronunciation research and teaching

    Contains fulltext: 199273.pdf (publisher's version) (Open Access). 25 p.

    Audio-visual expressions of attitude: How many different attitudes can perceivers decode?

    Mixdorff H, Hönemann A, Rilliard A, Lee T, Ma MKH. Audio-visual expressions of attitude: How many different attitudes can perceivers decode? Speech Communication. 2017;95:114-126.
    Based on the paradigm by Rilliard et al., we collected audio-visual expressions of attitudes such as arrogance, irony, sincerity and politeness in German. In the experimental design, subjects are immersed in sixteen different communicative situations in which they are supposed to portray a certain attitude in a short dialog. Attitudes can be propositional, that is, reactions to a factual situation, and/or social, that is, with respect to the relationship with the collocutor. Furthermore, attitudes can be of positive or negative valence, or neutral. Undeniably, there is a large repertory of subtle differences in the way certain talkers express certain attitudes. The important question, however, is whether collocutors from either the same language or a different one can actually decode these attitudes reliably. On that account, we carried out three perceptual experiments in which we presented our recordings of the portrayed attitudes audiovisually, audio-only and video-only. In the first study, German perceivers rated the expressions given the intended attitude; in the second study, they had to choose the most suitable from a choice of five attitudes; and in the third study, raters were able to assign freely the term best matching each attitudinal expression. This last experiment was recently replicated by native speakers of Cantonese in Hong Kong. The current article reviews and reevaluates the results from the first three experiments with the German subjects under the premise that perceivers actually have a more limited set of attitudinal registers which they can reliably draw on. This means that expressions can be sorted into a much smaller number of categories than the projected sixteen. In addition, we compare and contrast these resulting clusters with the new data from the Cantonese-speaking group. Our results indeed indicate a small number of readily decoded attitudes forming four clusters, depending on the experiment design, which are also distinct acoustically. Clusters from the statistical analysis are very similar for the German and the Cantonese perceivers and overlap with basic emotions. This result suggests that expressions of attitudes with low identification rates are more complex to decode and require more pragmatic information, that is, more contextual and possibly idiosyncratic information, to be interpreted correctly. (C) 2017 Elsevier B.V. All rights reserved

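    Entwicklung einer Prosodiesteuerung fuer die Sprachsynthese in hoher Qualitaet zum Einsatz in Text-to-Speech-Systemen. Abschlussbericht [Development of a prosody control for high-quality speech synthesis for use in text-to-speech systems. Final report]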

    SIGLE. Available from TIB Hannover: F99B118 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. Deutsche Forschungsgemeinschaft (DFG), Bonn (Germany). DE. German.